SPECIAL ARTICLES

ON THE COEFFICIENT OF CORRELATION AS A 

MEASURE OF RELATIONSHIP 

The theory of correlation deals with the rela- 
tionship between variable quantities in the case 
where that relationship lies somewhere be- 
tween functional dependence and complete 
independence. In the case of normal corre- 
lation for two variables a certain quantity r, 
which is zero for complete independence and 
± 1 for functional dependence, plays an important
role. The formula for r, in terms of n
observed pairs of values of two variables x and
y, is



(1)  $$ r = \frac{\sum_{i=1}^{n} (x_i - x_0)(y_i - y_0)}{\sqrt{\sum_{i=1}^{n} (x_i - x_0)^2 \, \sum_{i=1}^{n} (y_i - y_0)^2}}, $$

where $x_0$ is the mean of the x-values and $y_0$ the
mean of the y-values. 1 This formula has also
been given an interpretation for the case of 
skew correlation 2 which makes r an important 
quantity in many instances of such correlation. 
The quantity r is usually termed the coeffi- 
cient of correlation and is said to measure the 
amount of correlation between the variables 
x and y. This latter statement is too vague as 
it stands for scientific procedure, so it is de- 
sirable to state more precisely what is meant 
by it. In the case of normal correlation r has 
been shown to have the following significance: 3
if we take the mean of all the y's corresponding 
to a given value of x, then the deviation of 
this mean from the mean of all the y's, divided 
by the standard deviation of the y's, is equal 
to r times the deviation of the given x-value
from the mean of all the x's, divided by the 

1 Cf. Pearson, "Regression, Heredity and Panmixia,"
Philosophical Transactions of the Royal
Society, 187 A (1896); also Bravais, "Analyse
mathématique sur les probabilités des erreurs de
situation d'un point," Académie des Sciences:
Mémoires présentés par divers savants, Ser. 2, Vol.
9 (1846).

2 Cf. Yule, "On the Significance of Bravais's
Formulas for Regression, etc., in the Case of Skew
Correlation," Proceedings of the Royal Society,
Vol. 60 (1897).

3 Cf. Pearson, l. c.



standard deviation of the x's. Thus r may be 
said to measure the tendency of a given devia- 
tion from the mean in one of the variables to 
be associated with an average deviation from 
the mean of corresponding magnitude in the 
other variable. 
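The interpretation just given can be checked numerically. The following is a minimal sketch and not part of the original paper: the function names `correlation` and `standardized_slope` and the sample data are illustrative assumptions. It computes r from formula (1) and confirms that the least-squares slope of y on x, with both deviations measured in units of their standard deviations, coincides with r.

```python
import math

def correlation(xs, ys):
    """Coefficient of correlation r computed from formula (1)."""
    n = len(xs)
    x0 = sum(xs) / n  # mean of the x-values
    y0 = sum(ys) / n  # mean of the y-values
    sxy = sum((x - x0) * (y - y0) for x, y in zip(xs, ys))
    sxx = sum((x - x0) ** 2 for x in xs)
    syy = sum((y - y0) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def standardized_slope(xs, ys):
    """Least-squares slope of y on x, in standard-deviation units."""
    n = len(xs)
    x0 = sum(xs) / n
    y0 = sum(ys) / n
    sxy = sum((x - x0) * (y - y0) for x, y in zip(xs, ys))
    sxx = sum((x - x0) ** 2 for x in xs)
    syy = sum((y - y0) ** 2 for y in ys)
    slope = sxy / sxx                 # slope in the original units
    sd_x = math.sqrt(sxx / n)         # standard deviation of the x's
    sd_y = math.sqrt(syy / n)         # standard deviation of the y's
    return slope * sd_x / sd_y

# Illustrative data (hypothetical, not taken from the paper).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.5]
r = correlation(xs, ys)
b = standardized_slope(xs, ys)
```

Here `r` and `b` agree up to rounding, which is the sense in which r measures the average standardized deviation in one variable per unit standardized deviation in the other.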

It is clear that the value of r throws much 
light on the relationship between two variable 
quantities in the case of normal correlation. 
It is not apparent, however, that it gives us 
in every instance the information we are most 
interested in obtaining, and it will be shown 
in what follows that, in certain cases of interest
in the applications of the theory of correlation,
it will not necessarily give it.

The formula (1) is well adapted to the com- 
putation of r from observed values of x and y. 
For our purposes, however, we need a formula 
which exhibits r as a function of the under- 
lying variable quantities that determine x and 
y and the relationship between them. We shall 
now proceed to obtain such a formula on the 
basis of assumptions similar to those that 
Pearson used in his derivation of (1). 4

Let 

(2)  $$ x = f_1(e_1, e_2, \ldots, e_m), \qquad y = f_2(e_1, e_2, \ldots, e_m), $$

where the e's are independent variables that
follow a Gaussian distribution, and the f's are
analytic functions. If we expand the right- 
hand members of (2) about the mean values 
of the e's and neglect higher powers than the 
first, 5 we have 

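(3)  $$ \begin{aligned}
x - x_0 &= a_{11}\eta_1 + a_{12}\eta_2 + \cdots + a_{1m}\eta_m, \\
y - y_0 &= a_{21}\eta_1 + a_{22}\eta_2 + \cdots + a_{2m}\eta_m,
\end{aligned} $$

where the $\eta$'s are deviations of the e's from
their mean values and $x_0$ and $y_0$ are the mean
values of x and y, respectively.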
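Since the a's in (3) arise from a first-order expansion, they are the partial derivatives of the f's evaluated at the mean values of the e's. The sketch below illustrates this under stated assumptions: the helper `linear_coefficients`, the example function `f1`, and the chosen means are all hypothetical, and central differences are used only as a convenient numerical stand-in for the analytic derivatives.

```python
def linear_coefficients(f, means, h=1e-6):
    """Central-difference estimates of the partial derivatives of f at the means.

    These play the role of the a's in equation (3).
    """
    coeffs = []
    for k in range(len(means)):
        up = list(means)
        dn = list(means)
        up[k] += h
        dn[k] -= h
        coeffs.append((f(up) - f(dn)) / (2 * h))
    return coeffs

def f1(e):
    """A hypothetical analytic function of three elementary variables."""
    return e[0] * e[1] + e[2]

means = [2.0, 3.0, 1.0]                # assumed mean values of the e's
a1 = linear_coefficients(f1, means)    # close to [3.0, 2.0, 1.0]
```

For this `f1` the exact partial derivatives at the means are 3, 2 and 1, so the linear form in (3) would read x - x_0 = 3η_1 + 2η_2 + η_3.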

Since the e's are independent variables fol- 
lowing a Gaussian distribution, we have 

4 L. c.

5 Pearson assumes that the variations of the e's
from their mean values are small in comparison 
with those values, in order to justify the dropping 
of higher powers. It is more general to assume 
merely that for the range of values of the e's con- 
sidered, the f's are sufficiently good approxima- 
tions to linear functions to warrant the neglect of 
higher powers. 
$$ \eta_i' \eta_j' + \eta_i'' \eta_j'' + \cdots + \eta_i^{(n)} \eta_j^{(n)} = 0 \qquad (i \neq j), $$

where

$$ (\eta_i', \eta_j'),\ (\eta_i'', \eta_j''),\ \ldots,\ (\eta_i^{(n)}, \eta_j^{(n)}) $$

are n pairs of values of $\eta_i$ and $\eta_j$. Hence,
substituting in (1) the values of $(x - x_0)$ and
$(y - y_0)$ given by (3), we obtain



(4)  $$ r = \frac{\sum_{v=1}^{m} a_{1v} a_{2v} s_v^2}{\sqrt{\sum_{v=1}^{m} a_{1v}^2 s_v^2 \, \sum_{v=1}^{m} a_{2v}^2 s_v^2}}, $$

where the s's are the standard deviations of
the e's. The formula (4) for r is well adapted 
to the discussion of the connection between 
the value of r and the relationship between 
x and y. We shall use it first to show that 
under certain conditions r will not furnish a 
satisfactory measure of the particular form 
of relationship in which we are interested. 
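As a small illustration of how (4) behaves, the following sketch evaluates it directly. It assumes that (4) has the form r = Σ a_1v a_2v s_v² / sqrt(Σ a_1v² s_v² · Σ a_2v² s_v²), which is the form consistent with the worked value in (10); the function name `r_from_coefficients` and the example coefficients are hypothetical.

```python
import math

def r_from_coefficients(a1, a2, s):
    """Formula (4): r from the coefficients in (3) and the standard deviations of the e's."""
    num = sum(u * v * sv ** 2 for u, v, sv in zip(a1, a2, s))
    var_x = sum(u ** 2 * sv ** 2 for u, sv in zip(a1, s))  # variance of x under (3)
    var_y = sum(v ** 2 * sv ** 2 for v, sv in zip(a2, s))  # variance of y under (3)
    return num / math.sqrt(var_x * var_y)

# Hypothetical example: three elementary variables, only the middle one shared.
a1 = [1.0, 1.0, 0.0]
a2 = [0.0, 1.0, 1.0]
s = [1.0, 1.0, 1.0]
r = r_from_coefficients(a1, a2, s)
```

With one shared term out of two in each variable, the numerator is 1 and each sum under the radical is 2, so r comes out as one half; if the two rows of coefficients are proportional, the same function returns 1.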

Consider, for example, the use of correlation 
in educational investigations. A value for r is 
computed from the performances of a group 
of persons in two fields of mental activity, 
such as two school subjects, and the closeness 
of relationship between the two fields or sub- 
jects is discussed on the basis of this value. 
It is clear that the value of r is a good measure 
of the tendency of the members of the group 
having a given deviation from the mean ability 
of the whole group in one field, to have an 
average deviation of corresponding magnitude 
from the mean ability of the whole group in the 
other field. It is certainly useful to be able to 
measure such a tendency, but there is some- 
thing else which it is more useful from the 
educational standpoint to be able to measure. 
Suppose the average ability of the whole 
group in one field is increased a certain 
amount by training in that field, and this in 
turn causes a certain increase in the average 
ability of the whole group in the other field. 
The ratio of this latter increase to the former, 
when each is measured in terms of the stand- 
ard deviation of the group in the correspond- 
ing field, is a very important quantity in edu- 
cational investigations; it is vital for example 
in the discussion of such questions as disci- 
plinary values. 



We will now proceed to show that under cer- 
tain conditions this ratio may be much greater 
than r. Since ability in any complicated field 
of mental activity like a school subject may 
be regarded as a function of a great many ele- 
mentary abilities, the abilities x and y in two 
subjects may be represented as in equation (2).
If we expand about the mean values of the e's 
at any given time and neglect higher powers 
than the first, 6 we get equations of the type (3). 

Since ability in each of the two subjects will 
in general depend on certain elementary abil- 
ities not involved in the other, we shall con- 
sider a case where certain of the a's in the first 
equation in (3) are zero and certain of the a's 
in the second equation are zero. Let us sup- 
pose then that 

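(5)  $$ \begin{aligned}
a_{11} &= a_{12} = \cdots = a_{1p} = 0, \\
a_{2,m-p+1} &= a_{2,m-p+2} = \cdots = a_{2m} = 0,
\end{aligned} $$

and let us suppose further that

(6)  $$ \begin{aligned}
a_{j,p+1} &= a_{j,p+2} = \cdots = a_{j,m-p} = a \qquad (j = 1, 2), \\
a_{1,m-p+1} &= a_{1,m-p+2} = \cdots = a_{1m} = 100a, \\
a_{21} &= a_{22} = \cdots = a_{2p} = 100a, \\
s_1 &= s_2 = \cdots = s_m, \\
m &= 902p.
\end{aligned} $$
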

If by training in one subject the average 
ability of a group of persons in that subject is 
increased a certain amount, it is reasonable to 
suppose that this increase has been uniformly 
distributed in the way of corresponding in- 
creases in each of the elementary abilities in- 
volved in that subject. Since from (6) the 
standard deviations of the elementary abilities 
are all equal in the present case, a uniform 
distribution of the increase would imply an 
equal increase in each elementary ability. We 
will assume then that after training in the 
subject, the mean value of each e of which x
is a function is increased by a quantity $\delta$.
Since the $\eta$'s occurring on the right-hand side
of the first equation in (3) are deviations from
the original means of the e's, the mean value
of each of them will now be $\delta$ instead of zero.

6 In the present instance we neglect higher pow- 
ers on the assumption that ability in the given sub- 
ject is approximately a weighted mean of the ele- 
mentary abilities on which it depends. 
Hence, in view of (5) and (6), the mean value
of x will be

(7)  $$ x_0' = x_0 + 1{,}000\, p\, a\, \delta. $$

Similarly the mean value of y after the in- 
crease in the average value of the e's involved 
in x, the e's involved in y but not in x remain- 
ing constant, will be 

(8)  $$ y_0' = y_0 + 900\, p\, a\, \delta. $$

Therefore, since the standard deviations of x
and y, $s_x$ and $s_y$, are equal,

(9)  $$ \frac{(y_0' - y_0)/s_y}{(x_0' - x_0)/s_x} = 0.9. $$


It is apparent from (9) that in this instance 
a certain increase in the average ability x 
will be accompanied by an increase almost as 
great in the average ability y. 

If r is to be considered in all cases a reliable 
measure of the closeness of relationship be- 
tween two fields of mental activity, it ought to 
be approximately equal to the ratio in (9). 
Let us see what its value actually is. Making 
use of equations (4), (5) and (6), we get 

(10)  $$ r = \frac{900\, p a^2}{\sqrt{(900\, p a^2 + 10{,}000\, p a^2)(10{,}000\, p a^2 + 900\, p a^2)}} = 0.08 \text{ approximately.} $$

We have dealt here with a special case, but 
it is easy to see from the above discussion that 
in many other cases we would have discrep- 
ancies of the same sort. Hence it is apparent 
that it is not safe to assume off-hand that r is 
always the best measure of the relationship 
between two fields of mental activity. It may 
be a very poor measure of the form of rela- 
tionship in which we are interested. 7 
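The arithmetic of this special case is easy to reproduce. The sketch below is an illustration only: the function name `special_case` and the choices a = s = δ = 1 are assumptions made for the example, and the coefficient layout follows conditions (5) and (6) as read here (p terms peculiar to each variable with coefficient 100a, and m - 2p common terms with coefficient a).

```python
import math

def special_case(p, a=1.0, s=1.0, delta=1.0):
    """Return (ratio, r) for the special case defined by (5) and (6)."""
    m = 902 * p
    # Coefficients of x: zero for the first p terms, 100a for the last p.
    a1 = [0.0] * p + [a] * (m - 2 * p) + [100.0 * a] * p
    # Coefficients of y: 100a for the first p terms, zero for the last p.
    a2 = [100.0 * a] * p + [a] * (m - 2 * p) + [0.0] * p

    # Each e involved in x has its mean raised by delta, as in (7) and (8).
    dx = sum(u * delta for u in a1)
    dy = sum(v * delta for u, v in zip(a1, a2) if u != 0.0)

    sx = math.sqrt(sum((u * s) ** 2 for u in a1))  # standard deviation of x
    sy = math.sqrt(sum((v * s) ** 2 for v in a2))  # standard deviation of y

    ratio = (dy / sy) / (dx / sx)                               # the ratio in (9)
    r = sum(u * v * s * s for u, v in zip(a1, a2)) / (sx * sy)  # r as in (10)
    return ratio, r

ratio, r = special_case(p=1)
```

For any positive p the function gives a ratio of 0.9 while r stays at 900/10,900, a little over 0.08, since the few heavily weighted unshared terms dominate the variances without affecting the shift in the means to the same degree.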

The question naturally arises, under what 
conditions will r be a good approximation to 

7 We have restricted ourselves in the foregoing 
discussion to the case of relationship between dif- 
ferent fields of mental activity. The mathematical 
part of the discussion, however, will undoubtedly 
have a bearing on many applications of the theory 
of correlation. If for any two variables x and y, 
the a's of equation (3) satisfy the conditions of 
our special case, the ratio of the common factors
involved in the variation of x and y to all the factors,
will, for each variable, be 0.9. Hence r, which
is given by (10), will not be a good measure of the
closeness of relationship between the two variable 
quantities. 



the value of the ratio in (9)? It is the purpose
of the rest of this paper to obtain certain
sufficient conditions that this will be the case.
It is very easy to see that if all the a's of equa- 
tion (3) which are not zero are equal to each 
other in absolute value, and furthermore if 
the standard deviations of the e's are all equal 
to each other, r will be exactly equal to the 
ratio in (9). This leads one to suspect that 
if these conditions are fulfilled to a sufficient 
degree of approximation, r will not differ very 
much from this ratio. 
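The equality claimed here can be verified in a few lines. The following worked computation is a sketch under assumed notation that does not appear in the paper: x is taken to involve $n_x$ of the e's, y to involve $n_y$ of them, and $c$ of them to be common to both, with every non-zero coefficient equal to a positive quantity $a$ and every standard deviation equal to $s$. A uniform increase $\delta$ in each e involved in x then gives

```latex
\begin{aligned}
\frac{(y_0' - y_0)/s_y}{(x_0' - x_0)/s_x}
  &= \frac{c\,a\,\delta \,/\, (a s \sqrt{n_y})}{n_x\,a\,\delta \,/\, (a s \sqrt{n_x})}
   = \frac{c}{\sqrt{n_x n_y}}, \\[4pt]
r &= \frac{c\,a^2 s^2}{\sqrt{(n_x a^2 s^2)(n_y a^2 s^2)}}
   = \frac{c}{\sqrt{n_x n_y}}.
\end{aligned}
```

Both quantities reduce to the number of common elements divided by the geometric mean of the numbers of elements involved in each variable, so they coincide.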

In discussing the general case there are 
really two ratios of the type (9) to be con- 
sidered, according as the training has been in 
the field corresponding to x or in the field 
corresponding to y. In the special case dis- 
cussed above these two ratios were identical, 
so we only considered one of them. Under the 
hypotheses we shall make in what follows, the 
discussion for one ratio is practically the 
same as the discussion for the other, so here 
too we shall only consider one of them. 

We will investigate first the case where all 
the a's on the right-hand side of the equations 
in (3) are positive or zero. It is apparent that 
there is no loss of generality in supposing that 
the a's which are zero in the first equation are 
the a's of the first p terms and the a's which 
are zero in the second equation are the a's of 
the last q terms. In particular p, or q, or both 
of them, might be zero. 

Since the standard deviations of the e's in- 
volved in x are no longer necessarily equal to 
each other, a uniform distribution over these 
e's of an increase in x would result in an in- 
crease in each e proportional to its standard 
deviation. Let us suppose then that after 
training in the field corresponding to x the 
mean value of each $e_v$ involved in x has been
increased by an amount $s_v \delta$. Representing as
before by $x_0'$ the mean value of x after the
increase in the e's, we have

(11)  $$ x_0' - x_0 = \sum_{v=p+1}^{m} a_{1v} s_v \delta. $$

Similarly, if $y_0'$ represents the mean value of y
after the increase in the e's, we have

(12)  $$ y_0' - y_0 = \sum_{v=p+1}^{m-q} a_{2v} s_v \delta. $$

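Denoting by R the ratio of the type (9), we have
from (11) and (12)

(13)  $$ R = \frac{\displaystyle \sum_{v=p+1}^{m-q} a_{2v} s_v \Big/ \sqrt{\sum_{v=1}^{m-q} a_{2v}^2 s_v^2}}{\displaystyle \sum_{v=p+1}^{m} a_{1v} s_v \Big/ \sqrt{\sum_{v=p+1}^{m} a_{1v}^2 s_v^2}}. $$

Let us now suppose that two positive quantities
a and s, and a positive quantity $\rho < 1$,
exist, such that

(14)  $$ \begin{aligned}
a(1-\rho) &\le a_{1i} \le a(1+\rho) && (i = p+1, p+2, \ldots, m), \\
a(1-\rho) &\le a_{2i} \le a(1+\rho) && (i = 1, 2, \ldots, m-q), \\
s(1-\rho) &\le s_i \le s(1+\rho) && (i = 1, 2, \ldots, m).
\end{aligned} $$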

It follows readily from (13) and (14) that

(15)  $$ \left(\frac{1-\rho}{1+\rho}\right)^{4} \frac{m-p-q}{\sqrt{(m-p)(m-q)}} < R < \left(\frac{1+\rho}{1-\rho}\right)^{4} \frac{m-p-q}{\sqrt{(m-p)(m-q)}}. $$

Similarly from (4) and (14) we have

(16)  $$ \left(\frac{1-\rho}{1+\rho}\right)^{4} \frac{m-p-q}{\sqrt{(m-p)(m-q)}} < r < \left(\frac{1+\rho}{1-\rho}\right)^{4} \frac{m-p-q}{\sqrt{(m-p)(m-q)}}. $$


We might obtain still narrower limits for
the values of R and r than those given in (15)
and (16). It is apparent from the limits obtained,
however, that if $\rho$ is sufficiently small,
r will furnish a good approximation to the
value of R.

We will now consider the case where some
of the a's on the right-hand side of the equations
in (3) are negative. Let us suppose that
the first $\lambda$ of the $(m - p - q)$ $\eta$'s that appear
in both equations have coefficients of the same
sign in the two equations, and that the remainder,
$\mu$ in number, have coefficients of
opposite signs. Obviously, an increase in x
that is uniformly distributed with regard to
the e's involved in x will be accompanied by a
decrease in those e's for which the corresponding
$\eta$'s have negative coefficients in the first
equation in (3); also an increase in an $\eta$
having a negative coefficient in the second
equation will cause a corresponding decrease
in the value of y. Hence we have for the
ratio in (9)

(17)  $$ R = \frac{\displaystyle \left( \sum_{v=p+1}^{p+\lambda} |a_{2v}| s_v - \sum_{v=p+\lambda+1}^{m-q} |a_{2v}| s_v \right) \Big/ \sqrt{\sum_{v=1}^{m-q} a_{2v}^2 s_v^2}}{\displaystyle \sum_{v=p+1}^{m} |a_{1v}| s_v \Big/ \sqrt{\sum_{v=p+1}^{m} a_{1v}^2 s_v^2}}. $$

Let us now suppose that the a's and the s's
satisfy equations of the type (14), i. e., equations
obtained by replacing the a's in those
equations by their absolute values. Then it is
easy to see that if $\lambda > \mu$, and $\rho$ is sufficiently
small,

(18)  $$ \begin{aligned}
&\frac{\lambda-\mu}{\sqrt{(m-p)(m-q)}} \cdot \frac{(1+\rho^2)(1-\rho)^2}{(1+\rho)^4}
 - \frac{2(\lambda+\mu)}{\sqrt{(m-p)(m-q)}} \cdot \frac{\rho(1-\rho)^2}{(1+\rho)^4} < R \\
&\qquad < \frac{\lambda-\mu}{\sqrt{(m-p)(m-q)}} \cdot \frac{(1+\rho^2)(1+\rho)^2}{(1-\rho)^4}
 + \frac{2(\lambda+\mu)}{\sqrt{(m-p)(m-q)}} \cdot \frac{\rho(1+\rho)^2}{(1-\rho)^4}.
\end{aligned} $$

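Furthermore, in view of (4), we have for r

(19)  $$ \begin{aligned}
&\frac{\lambda-\mu}{\sqrt{(m-p)(m-q)}} \cdot \frac{1+6\rho^2+\rho^4}{(1+\rho)^4}
 - \frac{4(\lambda+\mu)}{\sqrt{(m-p)(m-q)}} \cdot \frac{\rho(1+\rho^2)}{(1+\rho)^4} < r \\
&\qquad < \frac{\lambda-\mu}{\sqrt{(m-p)(m-q)}} \cdot \frac{1+6\rho^2+\rho^4}{(1-\rho)^4}
 + \frac{4(\lambda+\mu)}{\sqrt{(m-p)(m-q)}} \cdot \frac{\rho(1+\rho^2)}{(1-\rho)^4}.
\end{aligned} $$
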

The corresponding inequalities for the cases
where $\lambda \le \mu$ are easily obtained. It follows
from (18) and (19), or the corresponding inequalities,
that r will be a good approximation
to R if $\rho$ is sufficiently small.
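A numerical spot check makes the closeness of r and R concrete. The sketch below is illustrative only and rests on several assumptions: R is computed as the ratio of standardized shifts described in the text, r as Σ a_1v a_2v s_v² divided by the product of the standard deviations of x and y, and the limits are written in the form (λ - μ)(…) ∓ (λ + μ)(…), which is one reading of (18) and (19). The names `ratio_and_r` and `near`, the seed, and the values p = q = 1, λ = 8, μ = 2, ρ = 0.05 are all hypothetical.

```python
import math
import random

def ratio_and_r(a1, a2, s):
    """Return (R, r): the ratio of type (9) and the coefficient of correlation."""
    # A uniform increase in x moves each e_v involved in x by sign(a1_v) * s_v
    # (taking delta = 1), so the mean of x rises by sum |a1_v| s_v.
    dx = sum(abs(u) * sv for u, sv in zip(a1, s))
    dy = sum(math.copysign(1.0, u) * v * sv
             for u, v, sv in zip(a1, a2, s) if u != 0.0)
    sx = math.sqrt(sum((u * sv) ** 2 for u, sv in zip(a1, s)))
    sy = math.sqrt(sum((v * sv) ** 2 for v, sv in zip(a2, s)))
    big_r = (dy / sy) / (dx / sx)
    r = sum(u * v * sv ** 2 for u, v, sv in zip(a1, a2, s)) / (sx * sy)
    return big_r, r

# Hypothetical configuration: lam same-sign and mu opposite-sign common terms.
p, q, lam, mu, rho = 1, 1, 8, 2, 0.05
m = p + lam + mu + q
rng = random.Random(0)

def near(center):
    """A value lying between center*(1 - rho) and center*(1 + rho)."""
    return center * (1.0 + rng.uniform(-rho, rho))

a1 = [0.0] * p + [near(1.0) for _ in range(lam + mu + q)]
a2 = ([near(1.0) for _ in range(p + lam)]
      + [-near(1.0) for _ in range(mu)]
      + [0.0] * q)
s = [near(1.0) for _ in range(m)]

big_r, r = ratio_and_r(a1, a2, s)

root = math.sqrt((m - p) * (m - q))
# Limits for R, following the assumed reading of (18).
lo_R = ((lam - mu) * (1 + rho**2) - 2 * (lam + mu) * rho) * (1 - rho)**2 / (root * (1 + rho)**4)
hi_R = ((lam - mu) * (1 + rho**2) + 2 * (lam + mu) * rho) * (1 + rho)**2 / (root * (1 - rho)**4)
# Limits for r, following the assumed reading of (19).
lo_r = ((lam - mu) * (1 + 6 * rho**2 + rho**4) - 4 * (lam + mu) * rho * (1 + rho**2)) / (root * (1 + rho)**4)
hi_r = ((lam - mu) * (1 + 6 * rho**2 + rho**4) + 4 * (lam + mu) * rho * (1 + rho**2)) / (root * (1 - rho)**4)
```

With the coefficients and standard deviations confined to a band of five per cent on either side of a common value, both `big_r` and `r` fall inside their respective limits, and shrinking `rho` draws the two sets of limits together around (λ - μ)/sqrt((m - p)(m - q)).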

The case where all the a's on the right-hand
side of (3) that are not zero, are negative, 
does not seem to have any great interest in 
connection with the applications discussed in 
this paper. In any event the treatment of that 
case presents no new difficulties, so we shall 
not consider it here. 
This paper makes no pretense of being an
exhaustive treatment of the subject under 
consideration. Its main object has been to 
point out as briefly as possible the danger of 
assuming that the coefficient of correlation is 
necessarily a satisfactory measure of all forms 
of relationship between two variable quantities, 
and at the same time to suggest a method of 
attack for determining in what way a partic- 
ular relationship depends on the value of this 
coefficient.

Charles N. Moore
University of Cincinnati

AN ABERRANT ECOLOGICAL FORM OF UNIO
COMPLANATUS DILLWYN

The variety of Unio complanatus Dillw. 
which is here described was found at Songo 
Pond, about three miles south of Bethel, Me. 
The specimens from which it is described were 
collected in August, 1913. The pond is a 
headwater of the Crooked River, one of the 
larger tributaries of the Presumpscot. It lies 
in a glacial scoop in alluvial sand, and 
is fed by springs mainly. A small brook a 
mile long enters it also. The country rock is 
a granitic gneiss of the eastern range of 
Montalban gneisses, and the intrusive granites 
scattered here and there are of the same min- 
eralogy. There is no limy rock in any form 
within many miles, a fact which will account 
for the peculiar structure of the shell. The 
specimens were picked up on a very gently 
sloping beach of round-grained sand, along the 
western shore of the pond, and in about two 
feet of water. The pond is about a mile and a 
quarter long, from north to south, and aver- 
ages a quarter of a mile in width. 

So far as I can determine, the soft parts of 
the animal are in every way normal for the 
species. The aberrancy occurs in the valves, 
and is in structure and in shape. 

The largest of my specimens, and the largest 
I have seen in the course of eight summers' pick- 
ing, measures two and three quarters by one 
and a half inches over all. The greatest thick- 
ness, from umbo to umbo, is three quarters of 
an inch. The following features are normal: 
hinge size and place, umbo size, place and 
shape, lateral and pseudocardinal teeth size 
and shape, scars, pallial line, and sculpture. 



Epidermis is of normal color, but thicker than 
usual, and overlaps the edge of the hard part 
of the shell up to 3/32 of an inch, being most 
extended at the siphonal region and along the 
anterior part of the ventral edge in many speci- 
mens. 

The shape of the shell is almost identical 
with that of Anodonta marginata Say, being 
roughly rhomboidal. It does not resemble the 
specimens of Unio complanatus from other 
regions in the American Museum, at New 
York, in this respect. From the posterior end 
of the hinge, the dorsal edge slopes ventrally, 
straight, at an angle between 35 and 40 de- 
grees from the line of the hinge. This por- 
tion of the edge is nearly straight and about 
as long as the hinge. It rounds off into the 
small semicircle of the posterior end. In ma- 
ture specimens there may be a slight flatten- 
ing of the posterior end at the point where the 
mantle forms a pair of siphons by its folding 
and coherence, but this is not constant and I 
find it only in the largest specimens. The 
ventral edge is not a uniform curve, but ap- 
proaches more or less to three straight lines, 
equal in length, each making an angle of 
about ten degrees with the line continuing the 
edge beyond it. The anterior end has the 
usual graceful elliptical outline, forming a 
large curve from hinge to ventral edge. 

There are no rays visible on any of my 
specimens. 

The most peculiar feature of the shell is the 
exceedingly small amount of mineral matter 
in it. When fresh the shells are horny and 
somewhat flexible, not unlike two layers of 
parchment pasted together, in texture. Alco- 
holic material and fresh are alike easily cut 
with a small shears, and there is no cracking. 
The thin nacreous layer breaks into small 
angular chunks, which adhere to the epidermis. 
I found only the faintest traces of a prismatic 
layer, in the largest specimens. Smaller ones 
fail entirely to show it. In my largest speci- 
mens there is at the umbo a larger amount of 
mineral matter, but even here it is hardly more 
in amount than at the margin in the normal 
shell of this species. The epidermis seems to 
me to be nearly twice as thick as in the normal 
type. In many specimens I found grains of 



